Join Cost for Unit Selection Speech Synthesis

Authors

  • Jithendra Vepa
  • Simon King
Abstract

In unit-selection speech synthesis systems, synthetic speech is produced by concatenating speech units selected from a large database, or inventory, which contains many instances of each speech unit with varied prosodic and spectral characteristics. Hence, by selecting an appropriate sequence of units, it is possible to synthesize highly natural-sounding speech. The selection of the best unit sequence from the database is typically treated as a search problem in which the best sequence of candidates from the inventory is the one that has the lowest overall cost [1]. This cost is often decomposed into two costs: a target cost (how closely candidate units in the inventory match the specification of the target phone sequence) and a join cost (how well neighboring units can be joined) [1]. If, as is usually the case, the cost functions used to compute these costs take into account only properties of the fixed target sequence and local properties of the candidates, the optimal unit sequence can be found efficiently by a Viterbi search for the lowest-cost path through the lattice of target and join costs.

In this chapter we focus on the calculation of the join cost (also known as concatenation cost). The ideal join cost is one that, although based solely on measurable properties of the candidate units (such as spectral parameters, amplitude, and F0), correlates highly with human listeners' perceptions of discontinuity at concatenation points. In other words, the join cost should predict the degree of perceived discontinuity. We use this terminology: a join cost is computed using a join cost function, which generally uses a distance measure on some parameterization of the speech signal.
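The search described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes, purely for illustration, that the join cost function is a Euclidean distance between boundary feature vectors (the abstract does not fix the distance measure or the parameterization), and the names `join_cost`, `viterbi_select`, and the `"start"`/`"end"` keys are hypothetical.

```python
import math

def join_cost(left_feats, right_feats):
    """Assumed join cost: Euclidean distance between the feature vector at the
    end of the left unit and the one at the start of the right unit."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(left_feats, right_feats)))

def viterbi_select(target_costs, candidates):
    """Find the lowest-cost path through the lattice of target and join costs.

    target_costs[t][i] -- target cost of candidate i at target position t
    candidates[t][i]   -- dict with hypothetical keys "start" and "end",
                          each a list of boundary features
    Returns (total_cost, path), where path[t] is the chosen candidate index.
    """
    best = list(target_costs[0])   # best cumulative cost ending in each candidate
    backpointers = []              # one list of predecessor indices per step
    for t in range(1, len(candidates)):
        new_best, new_back = [], []
        for j, cand in enumerate(candidates[t]):
            # Cost of reaching candidate j from every candidate at t - 1.
            costs = [best[i] + join_cost(prev["end"], cand["start"])
                     for i, prev in enumerate(candidates[t - 1])]
            i_min = min(range(len(costs)), key=costs.__getitem__)
            new_best.append(costs[i_min] + target_costs[t][j])
            new_back.append(i_min)
        best = new_best
        backpointers.append(new_back)

    # Trace back from the cheapest final candidate.
    j = min(range(len(best)), key=best.__getitem__)
    total = best[j]
    path = [j]
    for back in reversed(backpointers):
        j = back[j]
        path.append(j)
    path.reverse()
    return total, path

# Toy lattice: 3 target positions, 2 candidates each, 1-D boundary features.
candidates = [
    [{"start": [0.0], "end": [1.0]}, {"start": [0.0], "end": [5.0]}],
    [{"start": [1.0], "end": [2.0]}, {"start": [5.0], "end": [6.0]}],
    [{"start": [2.0], "end": [3.0]}, {"start": [9.0], "end": [9.0]}],
]
target_costs = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]

total, path = viterbi_select(target_costs, candidates)
print(total, path)  # 1.0 [0, 0, 0]
```

In this toy lattice the first candidate at position 0 has the higher target cost, yet it lies on the cheapest overall path because its joins are smooth. The search is efficient because each join cost depends only on a pair of adjacent candidates.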


Related articles

Subjective evaluation of join cost functions used in unit selection speech synthesis

In our previous papers, we have proposed join cost functions derived from spectral distances, which have good correlations with perceptual scores obtained for a range of concatenation discontinuities. To further validate their ability to predict concatenation discontinuities, we have chosen the best three spectral distances and evaluated them subjectively in a listening test. The unit sequences...


Kalman-filter based Join Cost for Unit

We introduce a new method for computing join cost in unit-selection speech synthesis which uses a linear dynamical model (also known as a Kalman filter) to model line spectral frequency trajectories. The model uses an underlying subspace in which it makes smooth, continuous trajectories. This subspace can be seen as an analogy for underlying articulator movement. Once trained, the model can be u...


Perfect Synthesis for All of the People All of the Time

The quality of speech synthesis has drastically improved over the last ten years. Or at least it appears that this is the case. We have moved from diphones to unit selection. However, although we can produce much more natural-sounding examples, we have also given up a certain amount of control over what can be synthesized. We have reached the stage where playing a few examples to a non-expert c...


Symbolic vs. acoustics-based style control for expressive unit selection

The present paper addresses the issue of flexibility in expressive unit selection speech synthesis by using different style selection techniques. We select units from a mixed-style unit selection database, using either forced style switching, no control, symbolic target cost, or acoustic target cost as a style selection criterion. We assess the effect of selection technique, feature weight and ...


On the role of spectral dynamics in unit selection speech synthesis

Cost functions employed in unit selection significantly influence the quality of speech output. Although unit selection can produce very natural-sounding speech, the quality can be inconsistent and is difficult to guarantee due to discontinuities between incompatible units. The join cost employed in unit selection to measure the suitability of concatenating speech units typically consists of sub...




Publication date: 2004